Queue Modeling
Our analysis uses 17 weeks of passenger flight data, which we split
into weekdays and weekends because the influx of passengers at the
airport differs significantly between the two. We further split each
group into four-hour clusters from 6:00 to 22:00 to compare traffic
across the morning, afternoon, evening and night periods. The first step
of our queuing model is to obtain an arrival rate for each cluster,
which we have tallied in this table:

We obtained the average arrival rate per minute by dividing the
number of arrivals (count) by the number of minutes in the cluster.
Interestingly, the arrival rate appears to be higher in the morning and
lower at night.
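The division described above can be sketched as follows. This is a
minimal illustration, not the exact computation behind the table: the
function name, the count and the assumption that every weekday of the
17 weeks is observed (85 days of 4 hours each) are hypothetical.

```python
def arrival_rate_per_minute(count, n_days, hours_per_day=4):
    """Average arrivals per minute for one cluster.

    count         -- number of arrivals observed in the cluster
    n_days        -- number of days contributing to the cluster (assumed)
    hours_per_day -- cluster length per day; 4 hours in this report
    """
    minutes = n_days * hours_per_day * 60
    if minutes <= 0:
        raise ValueError("cluster must span a positive number of minutes")
    return count / minutes


# Hypothetical weekday morning cluster: 17 weeks x 5 weekdays = 85 days.
# The count of 30600 is made up for illustration.
rate = arrival_rate_per_minute(count=30600, n_days=85)
print(rate)  # 1.5 arrivals per minute
```

A weekend cluster would use 34 days under the same assumption.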
Next, we look at the number of servers open during each time period.
Keep in mind that, within a time period, the number of servers at each
checkpoint can be adjusted at any moment (employee breaks, shift
changes, queue lengths). However, the number of servers we used was
recorded every 15 minutes, which limits the discrepancy between the
actual and the reported numbers.

The average number of servers does not follow the trend noted earlier,
namely that the influx is higher in the morning. During weekend
evenings, the proportion of 3 servers being open reaches its highest
value, 6.6%, even though this is a cluster with a lower arrival rate.
It would therefore probably be more efficient to open fewer servers in
the evening and more in the middle of the day. A possible explanation
for the mismatch is that the airport closes at 22:00 and all customers
must be served before the airfield closes, without running past
business hours.
This leads us to another concept, the performance of the servers,
which we measure as the proportion of people who waited less than a
given amount of time:

First, the count column of this table differs from the count in
table 1. This is because, for this part of the analysis, we removed all
the people who had no wait time (time spent between S1 and S2). People
who did not wait give no insight into the servers' performance; they
probably skipped the control at S1 and S2. They were either staff or
arrived when the main queue was empty, so they have nothing to do with
the number of servers open, and including them would bias the
performance results.
Second, for both weekdays and weekends, the clusters with the highest
average wait time are the morning clusters from 6:00 to 10:00, even
though they already have a large number of servers open. During the
week, 0.06% of people even wait more than 30 minutes, which suggests
that the number of servers may not be sufficient.
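The performance measure discussed above (share of passengers waiting
less than a given time, after dropping zero waits) can be sketched as
follows. The function name, the threshold values other than 30 minutes
and the sample waits are assumptions made for illustration only.

```python
def service_levels(wait_minutes, thresholds=(5, 10, 15, 30)):
    """Proportion of passengers who waited less than each threshold.

    Passengers with no wait (0 minutes between S1 and S2) are removed
    first, as described in the text. Thresholds are in minutes.
    """
    waited = [w for w in wait_minutes if w > 0]
    if not waited:
        return {}
    return {t: sum(w < t for w in waited) / len(waited) for t in thresholds}


# Made-up wait times in minutes; the 0 is dropped before computing shares.
levels = service_levels([0, 2, 4, 8, 12, 35])
print(levels)  # {5: 0.4, 10: 0.6, 15: 0.8, 30: 0.8}
```

The share waiting more than 30 minutes is then 1 - levels[30].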
Turning to the M/M/1 queueing model, we can estimate the probability
of waiting up to x units of time from the arrival rate and the service
rate.
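As an illustration, the standard M/M/1 result for the time spent
queueing W_q is P(W_q <= x) = 1 - ρ exp(-(μ - λ)x), with ρ = λ/μ. The
sketch below assumes this is the quantity of interest; the report does
not state whether the wait includes the service time, and the function
name and numbers are hypothetical.

```python
import math


def prob_wait_at_most(x, lam, mu):
    """P(W_q <= x) for a stable M/M/1 queue.

    x   -- waiting time, in the same time unit as the rates
    lam -- arrival rate (lambda)
    mu  -- service rate; must exceed lam for the queue to be stable
    """
    if lam <= 0 or mu <= lam:
        raise ValueError("need 0 < lam < mu for a stable queue")
    rho = lam / mu
    return 1.0 - rho * math.exp(-(mu - lam) * x)


# Hypothetical rates per minute: 1 arrival and 2 services per minute.
print(prob_wait_at_most(0, lam=1.0, mu=2.0))  # 0.5, i.e. 1 - rho
```

If the wait is meant to include service, the corresponding M/M/1
formula is 1 - exp(-(μ - λ)x) instead.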

We estimated the service rate from the arrival rate, which is already
known, and the average wait time. The estimated ρ is the traffic
intensity of the queuing system; if this value reaches or exceeds 1, the
queue is unstable and it is not meaningful to compute the performance
levels that describe the quality of service. We used the M/M/1 queuing
model to validate our raw data, and the resulting performance matches
the previous table.
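One way to back out a service rate from an arrival rate and a mean
wait is to invert the M/M/1 mean queueing time W_q = λ / (μ(μ - λ)).
The sketch below assumes that relation; it may differ from the
estimate used for the table (in particular, this inversion always
yields ρ below 1), and the function names and numbers are hypothetical.

```python
import math


def estimate_service_rate(lam, mean_wait):
    """Solve W_q = lam / (mu * (mu - lam)) for mu (positive root).

    lam       -- arrival rate per unit time
    mean_wait -- observed mean time spent queueing, same time unit
    """
    if lam <= 0 or mean_wait <= 0:
        raise ValueError("lam and mean_wait must be positive")
    a = mean_wait * lam
    # Quadratic mean_wait*mu^2 - a*mu - lam = 0, keeping the root > lam.
    return (a + math.sqrt(a * a + 4 * a)) / (2 * mean_wait)


def traffic_intensity(lam, mu):
    """Traffic intensity rho = lam / mu."""
    return lam / mu


mu = estimate_service_rate(lam=1.0, mean_wait=0.5)  # made-up inputs
print(mu, traffic_intensity(1.0, mu))  # 2.0 0.5
```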
Let us summarize all the information gathered so far in a table.

This gives us the arrival rate per server and the service rate per
server, which are the variables used in our regression, as shown in the
following plot:

The values fall close to our regression line, which indicates a
linear relation between the arrival rate per line and the service rate
per line.
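A straight-line fit of this kind can be sketched with ordinary least
squares. This is an illustration under assumptions: the report does not
specify how the regression was fitted, and the data points and function
name below are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x.

    xs -- arrival rates per line (explanatory variable)
    ys -- service rates per line (response variable)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples of size >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up points lying exactly on y = 1.5 + 1.0 * x.
intercept, slope = fit_line([1, 2, 3, 4], [2.5, 3.5, 4.5, 5.5])
print(intercept, slope)  # 1.5 1.0
```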
Using the regression, we now want to confirm the values obtained
previously, which gives us the following tables:

We obtained the updated service rate from the regression equation
μ = ac + bλ, where c is the number of servers, and the regression ρ by
dividing the arrival rate by the regression service rate.
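The two quantities above can be computed as in the following sketch.
The coefficient values a and b and the inputs are hypothetical
placeholders, not the fitted values from this analysis.

```python
def regression_service_rate(lam, c, a, b):
    """Service rate from the regression equation mu = a*c + b*lam."""
    return a * c + b * lam


def regression_rho(lam, c, a, b):
    """Regression rho: arrival rate divided by regression service rate."""
    mu = regression_service_rate(lam, c, a, b)
    if mu <= 0:
        raise ValueError("regression service rate must be positive")
    return lam / mu


# Placeholder coefficients and inputs, for illustration only.
print(regression_service_rate(lam=1.0, c=2, a=0.5, b=0.5))  # 1.5
print(regression_rho(lam=1.0, c=2, a=0.5, b=0.5))
```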
The regression gives us new estimated performance levels, which help
us find the predicted mean number of servers and compare it with the
actual number of servers.

We calculated the predicted mean number of servers using y = (ac +
bλ)x, which can be expressed through the Lambert function. Having found
y, we can then compute c, which gives the predicted mean number of
servers. This table confirms our initial thoughts: the predicted mean
number of servers is higher than the actual number of servers for the
morning cluster. It would therefore be sensible to add a fourth server
in the morning, or to keep at least three servers open for some time, to
reach a mean of 1.74 servers. Another efficient choice would be to
reduce the number of servers in the afternoon and add some in the
evening, even though the arrival rate is low. Overall, the actual number
of servers appears insufficient in the clusters where the predicted
number is high, which implies that more servers should be staffed in
those clusters, as they are predicted to be busier.

We notice that the predicted mean number of servers = 1.0136 × (actual
number of servers), where 1.0136 is the regression estimate of the
checkpoint departure d. Computed values of d near 1 for nearly all
checkpoints further validate the combined model.
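A proportionality factor of this kind can be estimated with a
least-squares regression through the origin, d = Σxy / Σx². The sketch
below assumes that no intercept is fitted, which the report does not
state explicitly; the function name and the server counts are made up.

```python
def slope_through_origin(actual, predicted):
    """Least-squares slope d for predicted = d * actual (no intercept)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty samples of equal length")
    sxx = sum(x * x for x in actual)
    if sxx == 0:
        raise ValueError("actual values must not all be zero")
    return sum(x * y for x, y in zip(actual, predicted)) / sxx


# Made-up mean server counts per cluster, for illustration only.
actual = [1.2, 1.5, 1.4, 1.3]
predicted = [1.25, 1.5, 1.45, 1.3]
d = slope_through_origin(actual, predicted)
print(round(d, 4))
```

A value of d close to 1 means the predicted and actual numbers of
servers agree on average.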